Machine learning and bioinformatics to identify 8 autophagy-related biomarkers and construct gene regulatory networks in dilated cardiomyopathy

Dilated cardiomyopathy (DCM) is a condition of ventricular remodeling with impaired systolic and diastolic function that is often complicated by arrhythmias and heart failure and carries a poor prognosis. This study attempted to identify autophagy-related genes (ARGs) that could serve as diagnostic biomarkers of DCM using machine learning and bioinformatics approaches. Differential analysis of whole-genome microarray data of DCM from the Gene Expression Omnibus (GEO) database was performed using the NetworkAnalyst 3.0 platform. Differentially expressed genes (DEGs) meeting the thresholds (|log2FoldChange| ≥ 0.8, p value < 0.05) were obtained from the GSE4172 dataset and intersected with the ARGs merged from the autophagy gene libraries HADb and HAMdb to obtain autophagy-related differentially expressed genes (AR-DEGs) in DCM. The correlation analysis of AR-DEGs and their visualization were performed in R. Gene Ontology (GO) enrichment analysis and combined multi-database pathway analysis were performed on the Enrichr online enrichment analysis platform. We used machine learning to screen the diagnostic biomarkers of DCM. The transcription factor-gene regulatory network was constructed from the JASPAR database on the NetworkAnalyst 3.0 platform. We also used the Drug Signatures Database (DSigDB) on the Enrichr platform to screen gene-targeted drugs for DCM. Finally, we used the DisGeNET database to analyze the comorbidities associated with DCM. In the present study, we identified 23 AR-DEGs of DCM. Eight molecular markers of DCM (PLEKHF1, HSPG2, HSF1, TRIM65, DICER1, VDAC1, BAD, TFEB) were obtained by two machine learning algorithms. A transcription factor-gene regulatory network was established. Finally, 10 gene-targeted drugs and the complications associated with DCM were identified.

www.nature.com/scientificreports/
... cellular component (CC), and molecular function (MF). The top ten terms of each category are listed in Table 1.
Based on the number of gene interactions, BP was mainly focused on the regulation of autophagy, positive regulation of autophagy, positive regulation of the cellular catabolic process, and macroautophagy. For cellular components, lysosome and lytic vacuole were significantly associated with autophagy-related differential genes, ultimately pointing to inflammatory cardiomyopathy in the human heart. Molecular function analysis revealed that AR-DEGs were most concentrated in low-density lipoprotein particle receptor binding. A similar concentration level was found for lipoprotein particle receptor binding and endoribonuclease activity. Notably, the pathway analysis in this study combined results from multiple databases (Table 2). Across the previously selected databases, the Longevity regulating pathway, Macroautophagy, PI3K-AKT-mTOR signaling pathway, therapeutic opportunities, and AMPK signaling were identified as the top pathways.
A comparison of GO terms was presented in Fig. 5a. Figure 5b provided pathway analysis from multiple databases.
Machine learning screened for autophagy-related biomarkers of DCM. The expression matrices of the 23 AR-DEGs were used to construct the best diagnostic model with both the least absolute shrinkage and selection operator (LASSO) regression and the support vector machine recursive feature elimination (SVM-RFE) algorithms, in order to obtain potential diagnostic biomarkers of DCM. The LASSO regression algorithm narrowed down the range of AR-DEGs of DCM and retained 9 variables as potential diagnostic biomarkers for DCM (Fig. 6a). The SVM-RFE algorithm identified 13 signature genes (Fig. 6b).

Construction of transcription factor (TF)-gene regulatory network.
Based on the JASPAR TF binding site profile database, the TF-gene regulatory network was constructed on the NetworkAnalyst 3.0 platform from the 8 diagnostic biomarkers of DCM (PLEKHF1, HSPG2, HSF1, TRIM65, DICER1, VDAC1, BAD, TFEB). The network included 46 nodes with 76 edges. In detail, these nodes comprise 8 seed genes and 38 transcription factors. TFEB was regulated by 19 transcription factors and DICER1 was regulated by 15 transcription factors. Figure 7 shows the TF-gene regulatory network.
... cloud/Enrichr/) web platform was used to identify drug molecules associated with the 8 diagnostic biomarkers for DCM. Gene-targeted drugs were collected based on p-values. Among drugs that meet the p-value threshold, a higher combined score indicates a stronger gene-drug association. The analysis showed that Melatonin CTD 00006260 and metformin CTD 00006282 were strongly associated with the DCM biomarker genes.
Genetic disease association analysis. Gene list enrichments were identified in the DisGeNET dataset.
All genes in the genome were used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) were collected and grouped into clusters based on their membership similarities. The top 10 enriched clusters are shown in Fig. 8. The algorithm used here was the same for pathway and

Discussion
It is well known that DCM involves ventricular dilation and impaired systolic and diastolic function, leading to arrhythmias and heart failure in severe cases. Unfortunately, because endomyocardial biopsy (EMB), the gold standard for diagnosing myocarditis and DCM, is rarely performed, most patients with early-stage cardiomyopathy are not effectively treated, and the prognosis is often poor in cases of concomitant arrhythmias and heart failure 2 . Therefore, early diagnosis, precise evaluation, and therapeutic
It is well known that autophagy plays an important role in cancer, neurodegenerative diseases, inflammatory diseases, and cardiac diseases 5 . Among these, autophagy mechanisms are increasingly studied in cardiac diseases, and autophagy plays a crucial role in maintaining typical cardiac structure, function, and therapy 16,17 . Two key autophagy-related molecules, mTOR and Beclin1, have been shown to play a regulatory role in myocardial ischemia-reperfusion injury 17 . Of the two, mTOR is involved in the PI3K and Akt pathway to regulate myocardial ischemia/reperfusion-induced apoptosis and autophagy 18 . In addition, Beclin1 exerts a positive impact during myocardial ischemia and an adverse effect during myocardial ischemia/reperfusion 19 . Currently, studies on the role of autophagy in cardiomyopathy-related diseases are increasing 13 , and research has shown that damage to the autophagic lysosomal pathway (ALP) and activation of inflammasomes are important factors contributing to DCM 14 . In mice with DCM, deficiency of NCOA4 (nuclear receptor coactivator 4, an autophagy-associated gene that mediates ferritin degradation) improved left ventricular size and cardiac function and inhibited free ferrous iron overload and the increase in lipid peroxidation 20 . Carolina et al. 21 found that autophagy-related genes, such as CALCOCO2 and NRBP2, the former of which regulates the expression of the latter, adversely affected left ventricular function parameters in patients with DCM.
In recent years, the exploration of the diagnostic and prognostic role of genetic biomarkers targeting DCM has been on the rise. For example, CYR61 and APN were identified as two target genes for DCM by gene expression profiling of the raw GSE4172 data 22 . It has been shown that RBM20 induces aberrant TTN splicing as a determinant of DCM and increases the risk of arrhythmias 23 . In previous bioinformatics studies, genes or transcription factors such as CTGF, POSTN, CORIN, and FIGF were closely associated with DCM 24 . However, few studies have been conducted on the value of autophagy-related genes in diagnosing DCM.
To the best of our knowledge, this study is the first to investigate the diagnostic role of ARGs in DCM by mining the GEO database and integrating machine learning and bioinformatics approaches. We used the NetworkAnalyst 3.0 platform to analyze in depth the GSE4172 dataset, which compares gene expression in parvovirus B19-associated DCM with that in healthy samples. Using differential analysis, we obtained 770 DEGs and combined them with the gene set from the autophagy databases to obtain 23 AR-DEGs of DCM. Finally, by machine learning methods such as LASSO regression and SVM-RFE, we obtained 8 diagnostic biomarkers of DCM (PLEKHF1, HSPG2, HSF1, TRIM65, DICER1, VDAC1, BAD, TFEB). Previous studies have linked these eight genes to DCM or cardiomyocyte remodeling. PLEKHF1 (Pleckstrin homology and FYVE domain containing 1) is located in the lysosome and plays a vital role in caspase-independent apoptosis, a process involved in autophagy 25 . In previous studies, PLEKHF1 has been reported as a susceptibility gene for several diseases. For example, Qi et al. identified PLEKHF1 as a potential biomarker for diabetic atherosclerosis 26 ; PLEKHF1 was also shown to be a potential biomarker for chronic graft-versus-host disease, the accuracy of which was confirmed by several independent clinical validation studies 27 . In addition, it has been shown that levosimendan ameliorated myocardial infarction and ventricular remodeling in diabetic rats and that the expression of the gene Plekhf1 was regulated by levosimendan, showing the potential of Plekhf1 as a target gene for myocardial infarction and diabetic cardiomyopathy 28 .
HSPG2 (Heparan sulfate proteoglycan 2) plays an important role in cancer growth, development, and metastasis 29 . Previous studies had shown that HSPG2 was identified in key cardiac-related regions controlled by chromosome 1p36 30 , and related studies had demonstrated that chromosome 1p36 deletion was responsible for cardiovascular malformations and cardiomyopathy 31 , suggesting an important role for HSPG2 in the pathogenesis and prognostic impact of cardiomyopathy 30 . In addition, HSPG2 also plays an independent predictive role in a variety of diseases. For example, HSPG2 was overexpressed in acute myeloid leukemia and can be used as a prognostic biomarker 32 . Recent studies had shown that HSPG2 deficiency was a risk factor for aortic coarctation 33 .
HSF1 (Heat shock transcription factor 1) is a significant heat stress response factor that plays an important role in inhibiting apoptosis and pathological remodeling of cardiomyocytes and is a protective factor for cardiomyocytes. In a previous quantitative transcriptomic analysis, HSF1 was found to be significantly enriched in cardiomyocytes 34 . It has been shown that HSF1 could be isolated by the death trap method, preventing hydrogen peroxide-induced cardiomyocyte death. It was found that overexpression of HSF1 in transgenic mice reduced ischemia-reperfusion-induced cardiomyocyte injury 35 . In the present study, HSF1 expression was lower in the DCM group than in the healthy control group, which is consistent with the findings of previous studies. In addition, it has been shown that overexpression of HSF1 in BAG mutation-associated DCM helped to attenuate pathological remodeling of cardiomyocytes and alleviate proteostatic stress 36 . In contrast, recent studies have shown that HSF1 overexpression led to reduced expression of myofilament localization-associated BAG3. Decreased expression of BAG3 was strongly associated with non-inherited heart failure, and male patients with DCM were more susceptible 37 . Therefore, the study of relevant molecules and pathways targeting HSF1 contributes to our understanding of DCM.
TRIM65 is an E3 ubiquitin ligase involved in the positive regulation of autophagy; it is expressed in vascular endothelial cells and located in the cytosol and nucleoplasm. Unfortunately, there are relatively few studies related to TRIM65. From the available literature, it appears that TRIM65 is mainly involved in proteopathy and ubiquitination, through which it regulates disease progression and serves as a target for a variety of diseases 38,39 . Interestingly, although few studies address the mechanisms linking TRIM65 and DCM, recent studies have closely linked TRIM65 to the NLRP3 inflammasome 40 , which is known to play a role in a variety of DCM 14 . TRIM65 has also been associated with antiviral innate immune mechanisms 41 . In addition, it has been shown that TRIM65 regulates VCAM-1 to control inflammatory responses 42 . All these studies point the way to exploring the molecular mechanism linking TRIM65 and DCM.
DICER1 is a member of the ribonuclease III (RNaseIII) family and is involved in the production of microRNAs, which regulate gene expression at the post-transcriptional level and are more frequently studied in oncological diseases 43 . Evidence suggests that DICER deletion resulted in a dramatic decrease in the level of the miRNAs it regulates, which led to severe DCM and heart failure in mice, a trend that was also seen in the expression of DICER proteins in diseased populations, implying an important role of DICER family genes in the pathogenesis of DCM 44 . Follow-up studies have shown that microRNAs act as negative regulators of genes and that specific regulation of microRNA expression could inhibit the loss of cardiac function due to DICER deficiency 45,46 , leading to cardioprotection. These studies suggest that endogenous microRNA competitive regulation of DICER family genes will be an essential strategy for gene targeting therapy in DCM.
VDAC (voltage dependent anion channel), including VDAC1 and VDAC2, is a mitochondrial outer membrane pore-forming protein present in all eukaryotes. As a mitochondrial transporter protein, VDAC is mostly expressed in cardiac tissue and has significant tissue specificity 47,48 . It is well known that Ca2+ plays a detrimental role in heart failure and myocardial ischemia/reperfusion: Ca2+ overload activates the matrix chaperone cyclophilin D (CypD), which regulates the VDAC1, Grp75, and IP3R1 complex and thus causes damage to cardiomyocytes, whereas inhibition of the CypD, VDAC1, Grp75, and IP3R1 complex could protect cardiomyocytes 49 . Numerous studies have shown 50,51 that regulation of VDAC1 expression through microRNA targeting could regulate mitochondrial function and promote the release of mitochondrial calcium for cell protection. Furthermore, in DCM mice, the lncRNA H19/miR-675 axis competitively downregulated VDAC1, reducing apoptosis. This report provides a new strategy to explore the role of VDAC1 in DCM. It was shown that VDAC1 expression was upregulated in the hearts of patients with hypertrophic cardiomyopathy 52 . In the present study, the expression of VDAC1 was also upregulated in samples from patients with DCM. These findings could explain the unique role played by VDAC1 as a target gene for DCM.
BAD (Bcl-2 associated agonist of cell death) often follows Bcl-2 and plays an anti-apoptotic role. In a TNFα-mediated mouse model of DCM in which apoptosis occurs, the expression of BAD was reduced in association with Bcl-2d 53 , which was consistent with the findings of the present study. According to previous studies, BAD played a key role in inducing β-cell apoptosis in Friedreich's ataxia, a neurodegenerative disease closely related to cardiomyopathy and diabetes 54 .
... negative regulation and play an important role in cardiovascular diseases, especially in heart failure and cardiac remodeling 55 . Studies have shown that multiple microRNAs play a regulatory role on BCL2 56 and that all of them were upregulated in heart failure 57 . As an antagonist of apoptosis, the protective role of BAD and Bcl-2 in the pathogenesis of DCM requires further study.
TFEB (transcription factor EB), a transcription factor located within the cytosol, is the master gene of the autophagic machinery of lysosomal biogenesis and coordinates the autophagic process, including autophagosome formation, autophagosome-lysosome fusion, and substrate degradation, by driving the expression of autophagy and lysosomal genes 58 . According to reports, TFEB expression was highest in 18-week-old fetal heart tissue, with significant tissue specificity 59 . There is growing evidence that TFEB plays an important role in various types of DCM. Lysosomal storage disorders (LSD) lead to cardiac involvement in hypertrophic cardiomyopathy and DCM 60 . Further studies have shown that the Yes-associated protein (YAP) and TFEB signaling pathway played a role in LSD by eliminating autophagic lysosomes, reducing cell death, and restoring cardiac function 61 . Also, it was found that TFEB deficiency led to cardiomyocyte hypertrophy and DCM causing heart failure 62 . Therefore, TFEB is a highly significant target in DCM.
In addition, we performed a functional enrichment analysis of the pathogenesis of DCM and related molecular pathways and found that AR-DEGs of DCM were mainly enriched in autophagy regulatory pathways and cell growth signaling, such as regulation of autophagy, macroautophagy, AMPK signaling pathway, PKB-mediated events, etc. AMPK (Adenosine monophosphate-activated protein kinase) signaling pathway had been reported to be an important intracellular signaling pathway in the heart 63 . As an emerging target recognized for the treatment of heart failure 64 , AMPK plays an important role in regulating cardiomyocyte growth 65 . Numerous studies had shown that the AMPK pathway and its binding autophagy-related pathways played a protective role in the pathological development of cardiomyopathy [66][67][68][69] . These studies have provided ideas to explore the mechanistic studies of autophagy-related DCM. PKB (protein kinase B), also known as serine/threonine kinase Akt, serves as a central node for a variety of biological processes 70 . It had been reported that PKB was involved in protective mechanisms against myocardial ischemia/reperfusion 71 . However, relatively few studies have been conducted on the association of PKB-mediated events with DCM. According to previous studies, Pleiotrophin, a pro-angiogenic factor, was significantly expressed in rat models of myocardial infarction and DCM patients. It is considered that Pleiotrophin protects the myocardium by inhibiting endogenous AKT/PKB activity 72 . In contrast, Alexander et al. found that PKB phosphorylation expression restored cardiac contractility in a zebrafish model of DCM 73 .
In addition, we constructed TF-gene regulatory networks based on 8 autophagy-related genes in DCM and predicted drugs that target them, such as Melatonin and metformin. Studies showed that Melatonin had a better inhibitory effect on left heart dysfunction and ventricular remodeling in DCM rats with cardiorenal syndrome 74 . Metformin was able to partially reverse ventricular remodeling in mice with DCM through an autophagic mechanism 75 . These studies provide a basis and direction for clinical precision targeting therapy and novel drug development in DCM. In addition, we explored the comorbidities associated with DCM, such as fatty liver disease. Some scholars found 76 that NAFLD affected the cardiovascular system through metabolic and inflammatory responses and also increased abnormalities of cardiac anatomy, including cardiomyopathy 77 . The disease pathways shared between the two need further investigation.
However, there are certain shortcomings in our study. First, our DCM dataset was mined and analyzed secondarily by bioinformatics means, and the results of the study need to be validated with external evidence. In addition, the results of this study need to be combined with single-cell sequencing as multi-omics research progresses. Finally, the mechanism of action and interrelationship between these 8 DCM genes and autophagy-related genes need further investigation.

Methods
Dilated cardiomyopathy dataset acquisition. The DCM data were downloaded as dataset GSE4172 from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo). The dataset was contributed by Wittchen et al. 22 and generated on the GPL570 [HG-U133_Plus_2] platform using the Affymetrix Human Genome U133 Plus 2.0 Array. It contains eight endomyocardial biopsy samples from patients with parvovirus B19-associated cardiac inflammation as the experimental group and four healthy human samples as the control group. Clinical information of patients from the GSE4172 dataset is presented in Table 4.
Autophagy genes acquisition. A total of 232 autophagy genes were downloaded from the Human Autophagy Database (HADb, http://autophagy.lu/). Similarly, 796 autophagy genes were obtained from the Human Autophagy Modulator Database (HAMdb, http://hamdb.scbdd.com) 78 . A total of 803 autophagy-related genes were obtained as the autophagy gene set for this study by taking the union of the two lists, that is, merging them and removing duplicates.
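The merging step can be sketched with plain set operations. This is a minimal illustration, not the authors' code: the gene symbols below are a toy subset chosen for the example, and the helper names `merge_gene_sets` and `implied_overlap` are hypothetical. The reported counts (232, 796, and 803) are consistent with a union in which 225 genes appear in both databases.

```python
# Toy gene lists only; the real HADb and HAMdb downloads are not reproduced here.
hadb = {"BAD", "TFEB", "VDAC1", "HSF1"}
hamdb = {"TFEB", "VDAC1", "DICER1", "TRIM65", "HSPG2"}


def merge_gene_sets(a, b):
    """Return (union, shared) for two sets of gene symbols."""
    union = a | b
    shared = a & b
    # Inclusion-exclusion: |A u B| = |A| + |B| - |A n B|
    assert len(union) == len(a) + len(b) - len(shared)
    return union, shared


def implied_overlap(n_a, n_b, n_union):
    """Number of shared genes implied by two list sizes and the size of their union."""
    return n_a + n_b - n_union


union, shared = merge_gene_sets(hadb, hamdb)
print(sorted(union))
print(sorted(shared))
print(implied_overlap(232, 796, 803))  # 225
```

An intersection of the two lists could not exceed 232 genes, which is why the reported total of 803 points to a union.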
Identification of differentially expressed genes (DEGs) in autophagy-related genes (ARGs). NetworkAnalyst 3.0 is a user-friendly bioinformatics visualization web platform for transcriptome analysis, gene network construction, and meta-analysis of gene expression data 79 . The expression data and grouping information of the GSE4172 dataset were submitted to NetworkAnalyst 3.0 to identify DEGs between the DCM group and the healthy control group. For mRNA in microarrays, the threshold was set to |log2FoldChange| ≥ 0.8 with a p value < 0.05, and genes meeting this criterion were considered DEGs. We used the ggplot2 and pheatmap packages (R version 4.1.3) to draw the volcano plot and heatmap of the DEGs. Autophagy-related genes (ARGs) and DEGs from the GSE4172 dataset were intersected to obtain the set of autophagy-related differentially expressed genes (AR-DEGs). Venn plots were created with the Omicshare online tool (https://www.omicshare.com/). The expression of the 23 AR-DEGs in GSE4172 was shown as box plots drawn with the ggpubr package.
Functional enrichment analysis. Functional enrichment consists of biological process, molecular function, and cellular component analysis 80 . Gene annotation uses Gene Ontology (GO) terminology, which covers biological processes, molecular functions, and cellular components. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database was used to understand metabolic pathways and plays an important role in the gene annotation process 81,82 . In addition, the BioCarta, WikiPathways 83 , and Reactome 84 databases were also used for pathway analysis. The Enrichr (https://amp.pharm.mssm.edu/Enrichr/) platform provides comprehensive gene enrichment analysis based on databases containing rich gene set annotation, pathway information analysis, and screening of gene target drugs 85,86 .
The GO terms of the AR-DEGs of DCM and all pathway information for this study were obtained from the Enrichr platform.
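Over-representation analyses of this kind generally rest on a one-sided Fisher exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below shows that calculation under stated assumptions: the function name `hypergeom_sf` is hypothetical, the background and term sizes in the usage line are invented for illustration, and only the query size of 23 (the number of AR-DEGs) comes from the study. It is not a reproduction of Enrichr's internal scoring.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` belong to the term."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, a term with 150 members,
# and 5 of the 23 AR-DEGs annotated to that term.
p = hypergeom_sf(5, 20000, 150, 23)
print(f"{p:.3e}")
```

A small p-value here means the term contains more query genes than random sampling from the background would be expected to produce.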

Machine learning identifies molecular markers of AR-DEGs in DCM.
In this study, least absolute shrinkage and selection operator (LASSO) logistic regression was used for feature gene selection to reduce the number of genes in the disease prediction model, address multicollinearity in the regression analysis, and screen the molecular markers of DCM genes 87 . The "glmnet" package was used to implement the LASSO regression algorithm with α set to 1; α is the mixing parameter that selects the type of penalty and thereby controls how the model behaves with highly correlated data. In addition, the Support Vector Machine-Recursive Feature Elimination (SVM-RFE) algorithm was also used in this study to rank the AR-DEGs and remove irrelevant genes to make the diagnostic prediction model more robust 88 . The SVM-RFE was implemented with the e1071 R package.
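For reference, the penalised objective that glmnet minimises for a binomial (logistic) model can be written as below. The notation is introduced here for illustration and is not taken from the study: N is the number of samples, x_i the expression vector of sample i, y_i ∈ {0, 1} its class label, and λ the penalty strength (its value is not stated in this section). With α = 1 the ridge term vanishes and only the L1 (LASSO) penalty remains, which shrinks some coefficients exactly to zero and so performs gene selection.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
- \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
+ \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2} + \alpha\,\lVert\beta\rVert_1\Big]
```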
Transcription Factor (TF)-gene regulatory network construction. The JASPAR (http://jaspar.genereg.net/) database was used to generate a visual analysis of the TF-gene co-regulatory network 89 . Based on 8 biomarkers of DCM, TFs that regulated the activity of functional pathways and gene expression levels in DCM were identified from the JASPAR database to form the TF-gene regulatory network. It is important to note that the JASPAR database is included in the NetworkAnalyst 3.0 platform.
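Network summaries such as node count, edge count, and the number of TFs regulating each gene can be derived from a plain edge list. The following is a minimal sketch, assuming the network is exported as (TF, target gene) pairs; the TF names are placeholders rather than results from the study, and `summarize_network` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder TF -> target edges; "TF_A" etc. are not real study results.
edges = [
    ("TF_A", "TFEB"),
    ("TF_B", "TFEB"),
    ("TF_A", "DICER1"),
    ("TF_C", "BAD"),
]


def summarize_network(edge_list):
    """Count nodes, unique edges, and regulating TFs per target gene."""
    regulators = defaultdict(set)
    for tf, gene in edge_list:
        regulators[gene].add(tf)
    tfs = {tf for tf, _ in edge_list}
    genes = set(regulators)
    return {
        # A gene that is also a TF (as HSF1 and TFEB are) is counted once.
        "nodes": len(tfs | genes),
        "edges": len(set(edge_list)),
        "in_degree": {gene: len(t) for gene, t in regulators.items()},
    }


summary = summarize_network(edges)
print(summary["nodes"], summary["edges"])
print(summary["in_degree"])
```

Applied to the full edge list, the same counts would give the node and edge totals and the per-gene regulator numbers reported for the network.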
Target drug screening. Gene target-based drug screening has become a new approach for identifying drug molecules, which helps to expand the scope of relevant drugs and shorten the drug development process. In this study, molecular markers of DCM were screened for drug candidates through the Drug Signatures Database (DSigDB), which consists of 17,389 drugs and 19,531 genes associated with the drugs 90 . The DSigDB database can be accessed through the Enrichr (https://www.amp.pharm.mssm.edu/Enrichr/) website by entering the relevant gene targets and downloading the target drug information. Drugs with p-values less than 0.05 and with larger combined scores were considered significant. The combined score represents how closely the small molecule drug is linked to the gene of interest.
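The selection rule described here (p-value below 0.05, then preference for larger combined scores) can be sketched as a filter followed by a sort. This is an illustration under stated assumptions: the drug names and numbers are invented, `DrugHit` and `rank_hits` are hypothetical names, and the scores are taken as given rather than recomputed. Enrichr's documentation describes the combined score as the log of the Fisher exact test p-value multiplied by a z-score for the deviation from the expected rank.

```python
from typing import NamedTuple


class DrugHit(NamedTuple):
    name: str
    p_value: float
    combined_score: float


def rank_hits(hits, alpha=0.05, top_n=10):
    """Keep hits with p < alpha and return them by descending combined score."""
    kept = [h for h in hits if h.p_value < alpha]
    return sorted(kept, key=lambda h: h.combined_score, reverse=True)[:top_n]


# Invented example rows, not values from the study.
hits = [
    DrugHit("drug_a", 0.001, 120.0),
    DrugHit("drug_b", 0.200, 500.0),  # high score but fails the p-value cut
    DrugHit("drug_c", 0.030, 300.0),
]
for hit in rank_hits(hits):
    print(hit.name, hit.p_value, hit.combined_score)
```

Filtering before ranking matters: a large combined score alone does not qualify a drug if its p-value misses the threshold.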
Genetic disease association analysis. The DisGeNET (http://www.disgenet.org) database is an open and versatile platform for studying specific human diseases and their comorbidities through genetic and molecular pathways, probing the characteristics of disease genes and offering the possibility to elucidate the mechanisms of disease 91 . In the present study, molecular markers of DCM were uploaded to the Metascape (https://metascape.org/gp/index.html#/main/step1) platform 92 , which contains the DisGeNET database. We identified DCM-related comorbidities through the DisGeNET database, laying the foundation for the mechanistic study of DCM.
Copyright permission of KEGG. We have contacted Kanehisa Laboratories. Because we do not directly use KEGG pathway map images in the article, we need not obtain copyright permission from KEGG. However, because the article was written using their data, they kindly asked us to cite the following articles 81,93,94 .
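The Results section lists the filtering criteria applied to enriched terms: p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5, where the enrichment factor is the ratio of observed to expected counts. A minimal sketch of that filter follows. The function names and the numbers in the usage lines are hypothetical, and the expected count is assumed to be the query size times the fraction of background genes annotated to the term.

```python
def enrichment_factor(observed, query_size, term_size, background_size):
    """Observed count divided by the count expected by chance."""
    expected = query_size * term_size / background_size
    return observed / expected


def passes_filter(p_value, observed, factor,
                  p_cut=0.01, min_count=3, min_factor=1.5):
    """Apply the three reported thresholds to one enriched term."""
    return p_value < p_cut and observed >= min_count and factor > min_factor


# Hypothetical term: 500 of 20,000 background genes, 3 of the 8 markers hit.
ef = enrichment_factor(observed=3, query_size=8,
                       term_size=500, background_size=20000)
print(ef)
print(passes_filter(p_value=0.001, observed=3, factor=ef))
```

With only 8 query genes, the minimum count of 3 is a demanding requirement, so terms that pass all three thresholds share a large fraction of the marker set.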

Data availability
The dataset GSE4172 for this study can be found in the GEO database (https://www.ncbi.nlm.nih.gov/geo). All data generated or analysed during this study are included in this published article.